The search functionality is under construction.

Author Search Result

[Author] Bo LIU(46hit)

41-46hit(46hit)

  • Density Optimization for Analog Layout Based on Transistor-Array

    Chao GENG  Bo LIU  Shigetoshi NAKATAKE  

     
    PAPER

      Vol:
    E102-A No:12
      Page(s):
    1720-1730

    In integrated circuit design of advanced technology nodes, layout density uniformity significantly influences the manufacturability due to the CMP variability. In analog design, especially, designers are suffering from passing the density checking since there are few useful tools. To tackle this issue, we focus a transistor-array(TA)-style analog layout, and propose a density optimization algorithm consistent with complicated design rules. Based on TA-style, we introduce a density-aware layout format to explicitly control the layout pattern density, and provide the mathematical optimization approach. Hence, a design flow incorporating our density optimization can drastically reduce the design time with fewer iterations. In a design case of an OPAMP layout in a 65nm CMOS process, the result demonstrates that the proposed approach achieves more than 48× speed-up compared with conventional manual layout, meanwhile it shows a good circuit performance in the post-layout simulation.

  • A Performance Fluctuation-Aware Stochastic Scheduling Mechanism for Workflow Applications in Cloud Environment

    Fang DONG  Junzhou LUO  Bo LIU  

     
    PAPER

      Vol:
    E97-D No:10
      Page(s):
    2641-2651

    Cloud computing, a novel distributed paradigm to provide powerful computing capabilities, is usually adopted by developers and researchers to execute complicated IoT applications such as complex workflows. In this scenario, it is fundamentally important to make an effective and efficient workflow application scheduling and execution by fully utilizing the advantages of the cloud (as virtualization and elastic services). However, in the current stage, there is relatively few research for workflow scheduling in cloud environment, where they usually just bring the traditional methods directly into cloud. Without considering the features of cloud, it may raise two kinds of problems: (1) The traditional methods mainly focus on static resource provision, which will cause the waste of resources; (2) They usually ignore the performance fluctuation of virtual machines on the physical machines, therefore it will lead to the estimation error of task execution time. To address these problems, a novel mechanism which can estimate the probability distribution of subtask execution time based on background VM load series over physical machines is proposed. An elastic performance fluctuations-aware stochastic scheduling algorithm is introduced in this paper. The experiments show that our proposed algorithm can outperform the existing algorithms in several metrics and can relieve the influence of performance fluctuations brought by the dynamic nature of cloud.

  • Low-Power Loop Parallelization onto CGRA Utilizing Variable Dual VDD

    Bing XU  Shouyi YIN  Leibo LIU  Shaojun WEI  

     
    PAPER-Architecture

      Pubricized:
    2014/11/19
      Vol:
    E98-D No:2
      Page(s):
    243-251

    Coarse Grained Reconfigurable Architectures (CGRAs) are promising platform based on its high-performance and low cost. Researchers have developed efficient compilers for mapping compute-intensive applications on CGRA using modulo scheduling. In order to generate loop kernel, every stage of kernel are forced to have the same execution time which is determined by the critical PE. Hence non-critical PEs can decrease the supply voltage according to its slack time. The variable Dual-VDD CGRA incorporates this feature to reduce power consumption. Previous work mainly focuses on calculating a global optimal VDDL using overall optimization method that does not fully exploit the flexibility of architecture. In this brief, we adopt variable optimal VDDL in each stage of kernel concerning their pattern respectively instead of the fixed simulated global optimal VDDL. Experiment shows our proposed heuristic approach could reduce the power by 27.6% on average without decreasing performance. The compilation time is also acceptable.

  • Compiler Framework for Reconfigurable Computing Architecture

    Chongyong YIN  Shouyi YIN  Leibo LIU  Shaojun WEI  

     
    BRIEF PAPER

      Vol:
    E92-C No:10
      Page(s):
    1284-1290

    Compiler is the most important supporting tool to facilitate the use of reconfigurable computing architecture (RCA). In this paper, a template-based compiler framework is proposed. This compiler can synthesize the executables for RCA from native high-level programming language source code directly. It supports to generate run-time dynamic configuration context. And it is capable to generate both full configuration context and partial configuration context. Experimental results show that the executables generated by the proposed compiler can achieve better execution performance and smaller configuration context size than previous compilers. Moreover, this compiler does not require the programmer to have any extra knowledge about the hardware architecture of RCA.

  • Reconfiguration Process Optimization of Dynamically Coarse Grain Reconfigurable Architecture for Multimedia Applications

    Bo LIU  Peng CAO  Min ZHU  Jun YANG  Leibo LIU  Shaojun WEI  Longxing SHI  

     
    PAPER-Computer System

      Vol:
    E95-D No:7
      Page(s):
    1858-1871

    This paper presents a novel architecture design to optimize the reconfiguration process of a coarse-grained reconfigurable architecture (CGRA) called Reconfigurable Multimedia System II ( REMUS-II ). In REMUS-II, the tasks in multi-media applications are divided into two parts: computing-intensive tasks and control-intensive tasks. Two Reconfigurable Processor Units (RPUs) for accelerating computing-intensive tasks and a Micro-Processor Unit (µPU) for accelerating control-intensive tasks are contained in REMUS-II. As a large-scale CGRA, REMUS-II can provide satisfying solutions in terms of both efficiency and flexibility. This feature makes REMUS-II well-suited for video processing, where higher flexibility requirements are posed and a lot of computation tasks are involved. To meet the high requirement of the dynamic reconfiguration performance for multimedia applications, the reconfiguration architecture of REMUS-II should be well designed. To optimize the reconfiguration architecture of REMUS-II, a hierarchical configuration storage structure and a 3-stage reconfiguration processing structure are proposed. Furthermore, several optimization methods for configuration reusing are also introduced, to further improve the performance of reconfiguration process. The optimization methods include two aspects: the multi-target reconfiguration method and the configuration caching strategies. Experimental results showed that, with the reconfiguration architecture proposed, the performance of reconfiguration process will be improved by 4 times. Based on RTL simulation, REMUS-II can support the 1080p@32 fps of H.264 HiP@Level4 and 1080p@40 fps High-level MPEG-2 stream decoding at the clock frequency of 200 MHz. The proposed REMUS-II system has been implemented on a TSMC 65 nm process. The die size is 23.7 mm2 and the estimated on-chip dynamic power is 620 mW.

  • Affine Transformations for Communication and Reconfiguration Optimization of Mapping Loop Nests on CGRAs

    Shouyi YIN  Dajiang LIU  Leibo LIU  Shaojun WEI  

     
    PAPER-Design Methodology

      Vol:
    E96-D No:8
      Page(s):
    1582-1591

    A coarse-grained reconfigurable architecture (CGRA) is typically hybrid architecture, which is composed of a reconfigurable processing unit (RPU) and a host microprocessor. Many computation-intensive kernels (e.g., loop nests) are often mapped onto RPUs to speed up the execution of programs. Thus, mapping optimization of loop nests is very important to improve the performance of CGRA. Processing element (PE) utilization rate, communication volume and reconfiguration cost are three crucial factors for the performance of RPUs. Loop transformations can affect these three performance influencing factors greatly, and would be of much significance when mapping loops onto RPUs. In this paper, a joint loop transformation approach for RPUs is proposed, where the PE utilization rate, communication cost and reconfiguration cost are under a joint consideration. Our approach could be integrated into compilers for CGRAs to improve the operating performance. Compared with the communication-minimal approach, experimental results show that our scheme can improve 5.8% and 13.6% of execution time on motion estimation (ME) and partial differential equation (PDE) solvers kernels, respectively. Also, run-time complexity is acceptable for the practical cases.

41-46hit(46hit)